(57) Abstract

A video encoding method and apparatus for adapting a video input to a bandwidth of a transmission channel of a network
includes determining the number N of enhancement layer bitstreams capable of being adapted to the bandwidth of the transmission channel
of the network. A base layer bitstream is encoded from the video input, and a plurality of enhancement layer bitstreams are encoded
from the video input. The enhancement layer bitstreams are based on the base layer bitstream, wherein the plurality of enhancement layer
bitstreams complements the base layer bitstream, and the base layer bitstream and N enhancement layer bitstreams are transmitted to the
network.

WO 00/05898 PCT/US99/16638
SCALABLE VIDEO CODING AND DECODING 

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a method and apparatus for the scaling of data
signals to the bandwidth of the transmission channel, and more particularly to a scalable
video method and apparatus for coding video such that the received video is adapted
to the bandwidth of the transmission channel.

Description of Related Art

Signal compression in the video arena has long been employed to increase the
bandwidth of either the generating, transmitting, or receiving device. MPEG - an
acronym for Moving Picture Experts Group - refers to the family of digital video
compression standards and file formats developed by the group. For instance, the
MPEG-1 video sequence is an ordered stream of bits, with special bit patterns marking
the beginning and ending of a logical section.

MPEG achieves a high compression rate by storing only the changes from one
frame to another, instead of each entire frame. The video information is then encoded
using a technique called DCT (Discrete Cosine Transform), which is a technique for
representing waveform data as a weighted sum of cosines. MPEG uses a type of
lossy compression wherein some data is removed, but the diminishment of data is
generally imperceptible to the human eye. It should be noted that the DCT itself does
not lose data; rather, data compression technologies that rely on DCT approximate
some of the coefficients to reduce the amount of data.
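By way of illustration only, the distinction between the lossless transform and the lossy approximation of its coefficients can be sketched in Python. The one-dimensional orthonormal DCT-II, the sample values, and the step size of 16 are assumptions chosen for this sketch and are not taken from any MPEG standard.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II: coefficient k is the weight of the k-th cosine."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d: rebuild the samples as a weighted sum of cosines."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of pixel values
coeffs = dct_1d(samples)
exact = idct_1d(coeffs)                      # transform alone: no data is lost

step = 16                                    # hypothetical quantizer step size
quantized = [round(c / step) for c in coeffs]
lossy = idct_1d([q * step for q in quantized])  # approximation: data is lost
```

In this sketch `exact` matches `samples` up to floating-point rounding, whereas `lossy` does not, because the rounding in `quantized` discards part of each coefficient.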

The basic idea behind MPEG video compression is to remove spatial
redundancy within a video frame and temporal redundancy between video frames. The
DCT-based (Discrete Cosine Transform) compression is used to reduce spatial
redundancy and motion compensation is used to exploit temporal redundancy. The
images in a video stream usually do not change much within small time intervals.
Thus, the idea of motion-compensation is to encode a video frame based on other
video frames temporally close to it.

A video stream is a sequence of video frames, each frame being a still image.
A video player displays one frame after another, usually at a rate close to 30 frames per
second. Macroblocks are formed, each macroblock consisting of four 8x8 luminance
blocks and two 8x8 chrominance blocks. Macroblocks are the units for
motion-compensated compression, whereas blocks are the basic units used for DCT
compression. Frames can be encoded in three types: intra-frames (I-frames), forward
predicted frames (P-frames), and bi-directional predicted frames (B-frames).

An I-frame is encoded as a single image, with no reference to any past or future
frames. Each 8x8 block is encoded independently, except that the coefficient in the
upper left corner of the block, called the DC coefficient, is encoded relative to the DC
coefficient of the previous block. The block is first transformed from the spatial
domain into a frequency domain using the DCT (Discrete Cosine Transform), which
separates the signal into independent frequency bands. Most frequency information is
in the upper left corner of the resulting 8x8 block. After the DCT coefficients are
produced the data is quantized, i.e. divided or separated. Quantization can be thought
of as ignoring lower-order bits and is the only lossy part of the whole compression
process other than sub-sampling.

The resulting data is then run-length encoded in a zig-zag ordering to optimize
compression. The zig-zag ordering produces longer runs of 0's by taking advantage of
the fact that there should be little high-frequency information (more 0's as one zig-zags
from the upper left corner towards the lower right corner of the 8x8 block).
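By way of illustration only, the zig-zag scan and the run-length step can be sketched in Python as follows. The function names and the sample block are hypothetical, and the run-length output is simplified: it emits (zero run, value) pairs and omits the end-of-block code that an actual MPEG bitstream would carry.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order

def run_length(values):
    """Encode a sequence as (zero_run, nonzero_value) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

# Hypothetical quantized block with its energy in the upper left corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 12, 5, -3, 2

scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_length(scanned)
```

Because the non-zero values cluster in the upper left corner, the scan places them first and leaves one long run of zeros at the end, which the run-length step drops entirely.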

A P-frame is encoded relative to the past reference frame. A reference frame is
a P- or I-frame. The past reference frame is the closest preceding reference frame. A
P-macroblock is encoded as a 16 x 16 area of the past reference frame, plus an error
term.

To specify the 16 x 16 area of the reference frame, a motion vector is included.
A motion vector (0, 0) means that the 16 x 16 area is in the same position as the
macroblock we are encoding. Other motion vectors are relative to that
position. Motion vectors may include half-pixel values, in which case pixels are
averaged. The error term is encoded using the DCT, quantization, and run-length
encoding. A macroblock may also be skipped, which is equivalent to a (0, 0) vector
and an all-zero error term.

A B-frame is encoded relative to the past reference frame, the future reference
frame, or both frames.

A pictorial view of the above processes and techniques in application is
depicted in prior art Fig. 15, which illustrates the decoding process for SNR
scalability. Scalable video coding means coding video in such a way that the quality of
a received video is adapted to the bandwidth of the transmission channel. Such a
coding technique is very desirable for transmitting video over a network with a
time-varying bandwidth.

SNR scalability defines a mechanism to refine the DCT coefficients encoded in
another (lower) layer of a scalable hierarchy. As illustrated in prior art Fig. 15, data
from two bitstreams is combined after the inverse quantization processes by adding
the DCT coefficients. Until the data is combined, the decoding processes of the two
layers are independent of each other.
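By way of illustration only, the combination step of a two-layer SNR scalable decoder can be sketched in Python. The uniform inverse quantizer, the step sizes, and the level values are assumptions made for this sketch; the actual MPEG-2 inverse quantization uses weighting matrices that are not reproduced here.

```python
def dequantize(levels, step):
    """Assumed uniform inverse quantizer: scale each level by the step size."""
    return [level * step for level in levels]

def snr_scalable_decode(base_levels, base_step, enh_levels, enh_step):
    """Decode both layers independently, then add their DCT coefficients."""
    base = dequantize(base_levels, base_step)   # coarse coefficients
    enh = dequantize(enh_levels, enh_step)      # refinement coefficients
    return [b + e for b, e in zip(base, enh)]

# Hypothetical levels for four coefficients of one block.
refined = snr_scalable_decode([11, -2, 0, 1], 16, [1, 3, -2, 0], 4)
```

The two calls to `dequantize` do not depend on each other, which mirrors the statement that the layers are decoded independently until the coefficients are added.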

The lower layer (base layer) is derived from the first bitstream and can itself be
either non-scalable, or require the spatial or temporal scalability decoding process, and
hence the decoding of an additional bitstream, to be applied. The enhancement layer,
derived from the second bitstream, contains mainly coded DCT coefficients and a small
overhead.

In the current MPEG-2 video coding standard, there is an SNR scalability
extension that allows two levels of scalability. MPEG achieves a high compression rate
by storing only the changes from one frame to another, instead of each entire frame.
There are at least two disadvantages of employing the MPEG-2 standard for encoding
video data. One disadvantage is that the scalability granularity is not fine enough,
because the MPEG-2 process is an all or none method. Either the receiving device can
receive all of the data from the base layer and the enhancement layer or only the data
from the base layer bitstream. Therefore, the granularity is not scalable. In a network
environment, more than two levels of scalability are usually needed.

Another disadvantage is that the enhancement layer coding in MPEG-2 is not 
efficient. Too many bits are needed in the enhancement layer in order to have a 
noticeable increase in video quality. 



The present invention overcomes these disadvantages and others by providing, 
among other advantages, an efficient scalable video coding method with increased 
granularity. 

SUMMARY OF THE INVENTION

The present invention can be characterized as a scalable video coding means
and a system for encoding video data, such that the quality of the final image is gradually
improved as more bits are received. The improved quality and scalability are achieved
by a method wherein an enhancement layer is subdivided into layers or levels of
bitstream layers. Each bitstream layer is capable of carrying information
complementary to the base layer information, in that as each of the enhancement layer
bitstreams is added to the corresponding base layer bitstream the quality of the
resulting images is improved.

The number N of enhancement layers is determined or limited by the network 
that provides the transmission channel to the destination point. While the base layer 
bitstream is always transmitted to the destination point, the same is not necessarily true 
for the enhancement layers. Each layer is given a priority coding and transmission is 
effectuated according to the priority coding. In the event that all of the enhancement 
layers cannot be transmitted the lower priority coded layers will be omitted. The 
omission of one or more enhancement layers may be due to a multitude of reasons. 

For instance, the server which provides the transmission channel to the
destination point may be experiencing large demand on its resources from other users;
in order to try to accommodate all of its users, the server will prioritize the data and
only transmit the higher priority coded packets of information. The transmission
channel may be the limiting factor because of the bandwidth of the channel, i.e.
Internet access port, Ethernet protocol, LAN, WAN, twisted pair cable, co-axial cable,
etc., or the destination device itself, i.e. modem, absence of an enhanced video card,
etc., may not be able to receive the additional bandwidth made available to it. In these
instances only M number (M is an integer number = 0, 1, 2, ...) of enhancement layers
may be received, wherein N number (N is an integer number = 0, 1, 2, ...) of
enhancement layers were generated at the encoding stage, M < N.

To achieve these and other advantages and in accordance with the purpose of
the present invention, as embodied and broadly described, the scalable video method
and apparatus according to one aspect of the invention includes a video encoding
method for adapting a video input to a bandwidth of a transmission channel of a
network. The method includes determining the number N of enhancement layer
bitstreams capable of being adapted to the bandwidth of the transmission channel of
the network. Encoding a base layer bitstream from the video input is then performed,
as is encoding N number of enhancement layer bitstreams from the video input based on
the base layer bitstream, wherein the plurality of enhancement layer bitstreams
complements the base layer bitstream. The base layer bitstream and the N
enhancement layer bitstreams are then provided to the network.

According to another aspect of the present invention, a video decoding method
for adapting a video input to a bandwidth of a transmission channel of a network
includes determining number M of enhancement layer bitstreams of said video input
capable of being received from said transmission channel of said network. Decoding a
base layer bitstream from received video input and decoding M number of
enhancement layer bitstreams from the received video input based on the base layer
bitstream, wherein the M received enhancement layer bitstreams complements the base
layer bitstream. Then reconstructing the base layer bitstream and N enhancement layer
bitstreams.

According to still another aspect of the present invention, a video decoding
method for adapting a video input to a bandwidth of a receiving apparatus, the method
includes demultiplexing a base layer bitstream and at least one of a plurality of
enhancement layer bitstreams received from a network, decoding the base layer
bitstream, decoding at least one of the plurality of enhancement layer bitstreams based
on generated base layer bitstream, wherein the at least one of the plurality of
enhancement layer bitstreams enhances the base layer bitstream. Then reconstructing a
video output.

According to a further aspect of the present invention, a video encoding
method for encoding enhancement layers based on a base layer bitstream encoded from
a video input, the video encoding method includes taking a difference between an
original DCT coefficient and a reference point and dividing the difference between the
original DCT coefficient and the reference point into N bit-planes.

According to a still further aspect of the present invention, a method of coding
motion vectors of a plurality of macroblocks includes determining an average motion
vector from N motion vectors for N macroblocks, utilizing the determined average
motion vector as the motion vector for the N macroblocks, and encoding 1/N motion
vectors in a base layer bitstream.

Additional features and advantages of the invention will be set forth in the 
description which follows, and in part will be apparent from the description, or may be 
learned by practice of the invention. The aspects and other advantages of the invention 
will be realized and attained by the structure particularly pointed out in the written 
description and claims hereof as well as the appended drawings. 

It is to be understood that both the foregoing general description and the
following detailed description are exemplary and explanatory and are intended to
provide further explanation of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further
understanding of the invention and are incorporated in and constitute a part of this
specification, illustrate embodiments of the invention and together with the description
serve to explain the principles of the invention. In the drawings:

Fig. 1 illustrates a flow diagram of the scalable video encoding method of the
present invention;

Fig. 2A illustrates a conventional probability distribution of DCT coefficient
values;

Fig. 2B illustrates a conventional probability distribution of DCT coefficient
residues;

Fig. 3A illustrates the probability distribution of DCT coefficient values of the
present invention;

Fig. 3B illustrates the probability distribution of DCT coefficient residues of the
present invention;

Figs. 3C and 3D illustrate a method for taking a difference of a DCT
coefficient of the present invention;

Fig. 5 illustrates a flow diagram for finding the maximum number of bit-planes
in the DCT differences of a frame of the present invention;

Fig. 6 illustrates a flow diagram for generating (RUN, EOP) symbols of the
present invention;

Fig. 7 illustrates a flow diagram for encoding enhancement layers of the
present invention;

Fig. 8 illustrates a flow diagram for encoding (RUN, EOP) symbols and
sign_enh values of one DCT block of one bit-plane;

Fig. 9 illustrates a flow diagram for encoding a sign_enh value of the present
invention;

Fig. 10 illustrates a flow diagram for adding enhancement difference to a DCT
coefficient of the present invention;

Fig. 11 illustrates a flow diagram for converting enhancement difference to a
DCT coefficient of the present invention;

Fig. 12 illustrates a flow diagram for decoding enhancement layers of the
present invention;

Fig. 13 illustrates a flow diagram for decoding (RUN, EOP) symbols and
sign_enh values of one DCT block of one bit-plane;

Fig. 14 illustrates a flow diagram for decoding a sign_enh value; and

Fig. 15 illustrates a prior art conventional SNR scalability flow diagram.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to the preferred embodiments of the
present invention, examples of which are illustrated in the accompanying drawings.

Fig. 1 illustrates the scalable video diagram 10 of an embodiment of the present
invention. The original video input 20 is encoded by the base layer encoder 30 in
accordance with the method represented by flow diagram 400 of Fig. 4. A DCT
coefficient OC and its corresponding base layer quantized DCT coefficient QC are
generated and a difference determined pursuant to steps 420 and 430 of Fig. 4. The
difference information from the base layer encoder 30 is passed to the enhancement
layer encoder 40 that encodes the enhancement information.

The encoding of the enhancement layer encoder is performed pursuant to
methods 500 - 900 as depicted in Figs. 5-10, respectively, and will be briefly
described. The bitstream from the base layer encoder 30 and the N bitstreams from the
enhancement layer encoder 40 are capable of being sent to the transmission channel 60
by at least two methods.

In the first method all bitstreams are multiplexed together by multiplexor 50
with different priority identifiers, e.g., the base layer bitstream is guaranteed, and
enhancement bitstream layer 1 provided by enhancement layer encoder 40 is given a
higher priority than enhancement bitstream layer 2. The prioritization is continued
until all N (wherein N is an integer from 0, 1, 2, ...) of the bitstream layers are
prioritized. Logic in the encoding layers 30 or 40, in negotiation with the network and
intermediate devices, determines the number N of bitstream layers to be generated.

The number of bitstream layers generated is a function of the total possible
bandwidth of the transmission channel 60, i.e. Ethernet, LAN, or WAN connections
(this list is not intended to be exhaustive but only representative of potential limiting
devices and/or equipment), and the network and other intermediate devices. The
number of bitstream layers M (wherein M is an integer and M < N) reaching the
destination point 100 can be further limited by not just the physical constraints of the
intermediate devices but the congestion on the network, thereby necessitating the
dropping of bitstream layers according to their priority.
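By way of illustration only, priority-based dropping of enhancement layers can be sketched in Python. The function name, the bitrates, and the decision to raise an error when even the base layer does not fit are assumptions made for this sketch; the disclosure itself only states that the base layer is guaranteed and that lower priority layers are dropped first.

```python
def layers_to_send(base_rate, enh_rates, available):
    """Return M, the number of enhancement layers that fit in the available bandwidth.

    enh_rates[0] is the highest-priority enhancement layer, enh_rates[-1] the lowest.
    """
    if base_rate > available:
        # Assumption of this sketch: the base layer is mandatory.
        raise ValueError("channel cannot carry even the base layer")
    used, m = base_rate, 0
    for rate in enh_rates:
        if used + rate > available:
            break            # this layer and every lower-priority layer are dropped
        used += rate
        m += 1
    return m

# Hypothetical rates in kbit/s: a 100 kbit/s base layer and N = 4 layers of 50 kbit/s.
m_congested = layers_to_send(100, [50, 50, 50, 50], 230)
m_clear = layers_to_send(100, [50, 50, 50, 50], 1000)
```

With the hypothetical rates above, a 230 kbit/s channel carries the base layer plus two enhancement layers, while a 1000 kbit/s channel carries all four.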

In a second method the server 50 knows the transmission channel 60 condition,
i.e. congestion and other physical constraints, and selectively sends the bitstreams to
the channel according to the priority identifiers. In either case, the destination point
100 receives the bitstream for the base layer and M bitstreams for the enhancement
layer, where M < N.

The M bitstreams are sent to the base layer 90 and enhancement layer 80
decoders after being demultiplexed by demultiplexor 70. The decoded enhancement
information from the enhancement layer decoder is passed to the base layer decoder to
composite the reconstructed video output 100. The decoding of the multiplexed
bitstreams is accomplished pursuant to the methods and algorithms depicted in flow
diagrams 1100 - 1400 of Figs. 11-14, respectively.

The base layer encoder and decoder are capable of performing logic pursuant
to the MPEG-1, MPEG-2, or MPEG-4 (Version-1) standards that are hereby
incorporated by reference into this disclosure.

Taking Residue with Probability Distribution Preserved 

A detailed description of the probability distribution residue will now be made
with reference to Figs. 2A - 3B.

In the current MPEG-2 signal-to-noise ratio (SNR) scalability extension, a
residue or difference is taken between the original DCT coefficient and the quantized
DCT coefficient. Fig. 2A illustrates the distribution of a residual signal as a DCT
coefficient. In taking the residue, small values have higher probabilities and large
values have smaller probabilities. The intervals along the horizontal axis represent
quantization bins. The dot in the center of each interval represents the quantized DCT
coefficient. Taking the residue between the original and the quantized DCT coefficient
is equivalent to moving the origin to the quantization point.

Therefore, the probability distribution of the residue becomes that as shown in
Fig. 2B. The residue from the positive side of Fig. 2A has a higher probability of
being negative than positive and the residue taken from the negative side of Fig. 2A
has a higher probability of being positive than negative. The result is that the
probability distribution of the residue becomes almost uniform, thus making coding
the residue more difficult.

A vastly superior method is to generate a difference between the original and
the lower boundary points of the quantized interval as shown in Fig. 3A and Fig. 3B.
In this method, the residue taken from the positive side of Fig. 2A remains positive
and the residue from the negative side of Fig. 2A remains negative. Taking the residue
is equivalent to moving the origin to the reference point as illustrated in Fig. 3A. Thus,
the probability of the residue becomes as shown in Fig. 3B. This method preserves the
shape of the original non-uniform distribution. Although the dynamic range of the
residue taken in such a manner seems to be twice that depicted in Fig. 2B, there is
no longer a need to code the sign, i.e. - or +, of the residue. The sign of the residue is
encoded in the base layer bitstream corresponding to the enhancement layer; therefore
this redundancy is eliminated and bits representing the sign are thus saved. Therefore,
there is only a need to code the magnitude, which still has a non-uniform distribution.
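By way of illustration only, the two ways of taking the residue can be compared in Python. The quantizer below, which truncates toward zero with a uniform step size, and the step size of 8 are assumptions made for this sketch; the figures of the disclosure do not fix a particular quantizer.

```python
def bin_index(c, step):
    """Quantization bin of coefficient c, truncating toward zero (assumed quantizer)."""
    return int(c / step)

def sign(x):
    return (x > 0) - (x < 0)

def residue_from_center(c, step):
    """Conventional residue: difference from the reconstruction point in mid-bin."""
    q = bin_index(c, step)
    center = sign(q) * (abs(q) * step + step // 2)
    return c - center

def residue_from_boundary(c, step):
    """Residue from the bin boundary nearer zero, so the sign of c is preserved."""
    return c - bin_index(c, step) * step

step = 8                                            # hypothetical step size
conventional = [residue_from_center(c, step) for c in (19, 22, -19)]
preserved = [residue_from_boundary(c, step) for c in (19, 22, -19)]
```

For the coefficients 19, 22 and -19 the conventional residues are -1, 2 and 1, so the sign carries no information about the original coefficient, whereas the boundary-referenced residues 3, 6 and -3 always share the sign of the original coefficient.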

Bit Plane Coding of Residual DCT Coefficients

After taking residues of all the DCT coefficients in an 8 x 8 block, bit-plane
coding is used to code the residue. The bit-plane coding
method considers each residual DCT coefficient as a binary number of several bits
instead of as a decimal integer of a certain value as in the run-level coding method.
The bit-plane coding method in the present invention only replaces the run-level coding
part. Therefore, all the other syntax elements remain the same.

An example and description of the bit-plane coding method will now be
given, wherein 64 residual DCT coefficients for an Inter-block and 63 residual DCT
coefficients for an Intra-block (excluding the Intra-DC component that is coded using
a separate method) are utilized for the example. The 64 (or 63) residual DCT
coefficients are ordered into a one-dimensional array and at least one of the residual
coefficients is non-zero. The bit-plane coding method then performs the following
steps.

The maximum value of all the residual DCT coefficients in a frame is
determined and the minimum number of bits, N, needed to represent the maximum
value in the binary format is also determined. N is the number of bit-plane layers for
this frame and is coded in the frame header.
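By way of illustration only, the determination of N can be sketched in Python using the built-in `int.bit_length()`. The frame contents are hypothetical; the second block is chosen so that N = 6 while the first block needs only 4 bits.

```python
def num_bit_planes(frame_residues):
    """Minimum number of bits needed for the largest residual magnitude in the frame."""
    largest = max(abs(r) for block in frame_residues for r in block)
    return largest.bit_length()

# Hypothetical frame of two (shortened) blocks of residual DCT coefficients.
frame = [[10, 0, 6, 0, 0, 3], [37, 1, 0, 0, 2, 0]]
n = num_bit_planes(frame)

# Number of all-zero bit-planes above the most significant plane of the first block.
leading_all_zero = n - max(frame[0]).bit_length()
```

Here 37 is 100101 in binary, so N is 6, and the first block, whose maximum of 10 needs 4 bits, has two all-zero bit-planes above its most significant plane.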

Within each 8x8 block, every one of the 64 (or 63) residual DCT
coefficients is represented with N bits in the binary format, thereby forming N bit-planes
or layers or levels. A bit-plane is defined as an array of 64 (or 63) bits, taken one from each
residual DCT coefficient at the same significant position.

The most significant bit-plane with at least one non-zero bit is determined and
then the number of all-zero bit-planes between the most significant bit-plane
determined and the Nth one is coded. Then starting from the most significant bit plane
(MSB plane), 2-D symbols are formed of two components: (a) number of consecutive
0's before a 1 (RUN), (b) whether there are any 1's left on this bit plane, i.e.
End-Of-Plane (EOP). If a bit-plane after the MSB plane contains all 0's, a special symbol
ALL-ZERO is formed to represent an all-zero bit-plane. Note that the MSB plane
does not have the all-zero case because any all-zero bit-planes before the MSB plane
have been coded in the previous steps.

Four 2-D VLC tables are used, wherein the table VLC-Table-0 corresponds to
the MSB plane; table VLC-Table-1 corresponds to the second MSB plane; table
VLC-Table-2 corresponds to the third MSB plane; and table VLC-Table-3 corresponds to
the fourth MSB and all the lower bit planes. For the ESCAPE cases, RUN is coded
with 6 bits and EOP is coded with 1 bit. Escape coding is a method to code very small
probability events which are not in the coding tables individually.

An example of the above process will now follow. For illustration purposes,
we will assume that N = 6 and that the residual values after the zigzag ordering are
given as follows:

10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ... 0, 0

The maximum value in this block is found to be 10 and the minimum number of
bits to represent 10 in the binary format (1010) is 4. Therefore, two all-zero bit-planes
before the MSB plane are coded with a code for the value 2 and the remaining 4
bit-planes are coded using the (RUN, EOP) codes. Writing every value in the binary
format using 4 bits, the 4 bit-planes are formed as follows:

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0 (MSB-plane)

0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0 (Second MSB-plane)

1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, ... 0, 0 (Third MSB-plane)

0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... 0, 0 (Fourth MSB-plane or LSB-plane)
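By way of illustration only, the formation of the bit-planes can be reproduced in Python. The function name is hypothetical, and only the first 16 residual values of the example are used, since the remaining values of the block are all zero.

```python
def bit_planes(values, nbits):
    """Split non-negative integers into nbits bit-planes, most significant plane first."""
    return [[(v >> b) & 1 for v in values] for b in range(nbits - 1, -1, -1)]

# First 16 residual values of the example; the rest of the block is zero.
residues = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]

planes = bit_planes(residues, 4)
for name, plane in zip(("MSB", "second MSB", "third MSB", "LSB"), planes):
    print(name, plane)
```

Each inner list holds one bit from every coefficient at the same significant position, which is the definition of a bit-plane given above.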

Converting the bits of each bit-plane into (RUN, EOP) symbols results in the 
following: 



(0, 1) (MSB-plane)

(2, 1) (Second MSB-plane)

(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1) (Third MSB-plane)

(5, 0), (8, 1) (Fourth MSB-plane or LSB-plane)
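By way of illustration only, the conversion of a bit-plane into (RUN, EOP) symbols can be sketched in Python. The function name is hypothetical, and an all-zero plane is represented here by an empty list, leaving the ALL-ZERO symbol to the caller.

```python
def run_eop_symbols(plane):
    """Convert one bit-plane into (RUN, EOP) symbols.

    RUN counts the consecutive 0's before each 1; EOP is 1 only for the last 1
    of the plane. An all-zero plane yields an empty list.
    """
    ones = [i for i, bit in enumerate(plane) if bit]
    symbols, prev = [], -1
    for n, pos in enumerate(ones):
        run = pos - prev - 1
        eop = 1 if n == len(ones) - 1 else 0
        symbols.append((run, eop))
        prev = pos
    return symbols

# The third MSB-plane and the LSB-plane of the example (first 16 positions).
third = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
lsb = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
third_symbols = run_eop_symbols(third)
lsb_symbols = run_eop_symbols(lsb)
```

The zeros that follow the last 1 of a plane never need to be coded, because EOP = 1 already tells the decoder that the rest of the plane is zero.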

Therefore, there are 10 symbols to be coded using the (RUN, EOP) VLC
tables. Based on their locations in the bit-planes, different VLC tables are used for the
coding. The enhancement bitstream using all four bit-planes looks as follows:
code_leading-all-zero(2)
code_msb(0,1)
code_msb-1(2,1)
code_msb-2(0,0), code_msb-2(1,0), code_msb-2(2,0), code_msb-2(1,0), code_msb-2(0,0), code_msb-2(2,1)
code_msb-3(5,0), code_msb-3(8,1).

In an alternative embodiment, several enhancement bitstreams may be formed
from the four bit-planes, in this example from the respective sets comprising one or
more of the four bit-planes.

Motion Vector Sharing 

In this alternative embodiment of the present invention motion vector sharing is
capable of being utilized when the base layer bitstream exceeds a predetermined size or
more levels of scalability are needed for the enhancement layer. By lowering the
number of bits required for coding the motion vectors in the base layer, the bandwidth
requirement of the base layer bitstream is reduced. In base layer coding, a
macroblock (16 x 16 pixels for the luminance component and 8 x 8 pixels for each
chrominance component) of the current frame is compared with the previous frame
within a search range. The closest match in the previous frame is used as a prediction
of the current macroblock. The relative displacement of the prediction to the current
macroblock, in the horizontal and vertical directions, is called a motion vector.
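By way of illustration only, the search for the closest match can be sketched in Python as an exhaustive block-matching search. The sum of absolute differences (SAD) as the matching criterion, the function names, the small 2 x 2 block, and the synthetic frames are all assumptions made for this sketch; the disclosure does not prescribe a particular matching criterion or search strategy.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def find_motion_vector(cur, ref, bx, by, size, search):
    """Return the (dx, dy) displacement within +/- search that minimizes the SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Synthetic frames: the current frame shows the reference content displaced so that
# the block at (2, 2) is found 2 pixels to the right and 1 pixel down in the reference.
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]

mv = find_motion_vector(cur, ref, bx=2, by=2, size=2, search=2)
```

The returned pair is the displacement of the prediction relative to the current block, horizontal component first.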

The difference between the current macroblock and its prediction is coded
using the DCT coding. In order for the decoder to reconstruct the current
macroblock, the motion vector has to be coded in the bitstream. Since there is a fixed
number of bits for coding a frame, the more bits spent on coding the motion vectors
results in fewer bits for coding the motion compensated differences. Therefore, it is
desirable to lower the number of bits for coding the motion vectors and leave more bits
for coding the differences between the current macroblock and its prediction.






For each set of 2 x 2 motion vectors, the average motion vector can be
determined and used for the four macroblocks. In order to not change the syntax of
the base layer coding, four macroblocks are forced to have identical motion
vectors. Since only one out of four motion vectors is coded in the bitstream, the amount
of bits spent on motion vector coding is reduced; therefore, there are more bits
available for coding the differences. The cost for pursuing such a method is that the
four macroblocks, which share the same motion vector, may not get the best matched
prediction individually and the motion compensated difference may have a larger
dynamic range, thus necessitating more bits to code the difference.
For a given fixed bitrate, the savings from coding one out of four motion
vectors may not compensate for the increased number of bits required to code the
difference with a larger dynamic range. However, for a time varying bitrate, a wider
dynamic range for the enhancement layer provides more flexibility to achieve the best
possible usage of the available bandwidth.
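By way of illustration only, sharing one average motion vector among each 2 x 2 group of macroblocks can be sketched in Python. The function name, the grid layout, and the rounding of the average to the nearest integer with Python's built-in `round` are assumptions made for this sketch; the disclosure does not specify how the average is rounded.

```python
def share_motion_vectors(mv_grid):
    """Replace each 2 x 2 group of motion vectors by its average.

    mv_grid is a list of rows of (dx, dy) tuples, one per macroblock; both
    dimensions are assumed to be even.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    shared = [[None] * cols for _ in range(rows)]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            group = [mv_grid[r + i][c + j] for i in (0, 1) for j in (0, 1)]
            avg = (round(sum(v[0] for v in group) / 4),
                   round(sum(v[1] for v in group) / 4))
            # All four macroblocks are forced to use the same vector.
            for i in (0, 1):
                for j in (0, 1):
                    shared[r + i][c + j] = avg
    return shared

# Hypothetical motion vectors of four neighbouring macroblocks.
grid = [[(2, 0), (4, 0)],
        [(2, 2), (4, 2)]]
shared = share_motion_vectors(grid)
```

Only one vector per group then needs to be coded, at the cost that none of the four macroblocks is guaranteed to keep its individually best prediction.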


In an alternative embodiment of the present invention, if the base layer
quantized DCT coefficient is non-zero, the corresponding enhancement layer
difference will have the same sign as the base layer quantized DCT coefficient. Therefore,
there is no need to code the sign bit in the enhancement layer.

Conversely, if the base layer quantized DCT coefficient is zero and the
corresponding enhancement layer difference is non-zero, a sign bit is placed into the
enhancement layer bitstream immediately after the MSB of the difference.
An example of the above method will now follow.

Difference of a DCT block after ordering

- 10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ... 0, 0

Sign indications of the DCT block after ordering

- 3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2, ... 2, 3

- 0: base layer quantized DCT coefficient = 0 and difference > 0

- 1: base layer quantized DCT coefficient = 0 and difference < 0

- 2: base layer quantized DCT coefficient = 0 and difference = 0

- 3: base layer quantized DCT coefficient ≠ 0.

In this example, the sign bits associated with values 10, 6, 2 don't need to be
coded and the sign bits associated with 3, 2, 2, 1 are coded in the following way:
code(All Zero)
code(All Zero)
code(0,1)
code(2,1)
code(0,0), code(1,0), code(2,0), 0, code(1,0), code(0,0), 1, code(2,1), 0
code(5,0), code(8,1), 1
For every DCT difference, there is a sign indication associated with it. There
are four possible cases. In the above coding 0, 1, 2, and 3 are used to denote the four
cases. If the sign indication is 2 or 3, the sign bit does not have to be coded because it
is either associated with a zero difference or available from the corresponding base
layer data. If the sign indication is 0 or 1, a sign bit code is required once per difference
value, i.e. not in every bit-plane of the difference value. Therefore, a sign bit is put
immediately after the most significant bit of the difference.
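By way of illustration only, the placement of the sign bits can be reproduced in Python for the example above. The function name and the representation of the stream as a Python list are hypothetical; the sketch assumes, consistently with the example, that the emitted sign bit equals the sign indication (0 for a positive and 1 for a negative difference), and it emits one ALL-ZERO symbol per empty plane as in the listing above.

```python
def encode_with_signs(diffs, sign_ind, nbits):
    """Emit (RUN, EOP) symbols plane by plane, most significant plane first.

    diffs holds the magnitudes of the differences. A sign bit follows the first
    1 bit of a coefficient (its most significant bit) when its sign indication
    is 0 or 1.
    """
    stream, signed = [], set()
    for b in range(nbits - 1, -1, -1):
        ones = [i for i, v in enumerate(diffs) if (v >> b) & 1]
        if not ones:
            stream.append("ALL-ZERO")
            continue
        prev = -1
        for n, pos in enumerate(ones):
            stream.append((pos - prev - 1, int(n == len(ones) - 1)))
            if sign_ind[pos] in (0, 1) and pos not in signed:
                stream.append(sign_ind[pos])   # sign bit, coded once per coefficient
                signed.add(pos)
            prev = pos
    return stream

# First 16 values of the example; the remainder of the block is zero.
diffs = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
signs = [3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2]
stream = encode_with_signs(diffs, signs, nbits=6)
```

Under these assumptions the resulting list matches the coded sequence of the example: two ALL-ZERO symbols, then the (RUN, EOP) symbols with the sign bits 0, 1, 0 in the third plane and 1 in the last plane.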

Optimal Reconstruction of the DCT Coefficients

In an alternative embodiment of the present invention, even though N
enhancement bitstream layers or planes may have been generated, only M, wherein M
< N, enhancement layer bits are available for reconstruction of the DCT coefficients
due to the channel capacity and other constraints such as congestion, among others;
the decoder 80 of Fig. 1 may receive no enhancement difference or only a partial
enhancement difference. In such a case, the optimal reconstruction of the DCT
coefficients is capable of proceeding along the following method:

If decoded difference = 0, the reconstruction point is the same as that in the base
layer; otherwise, the reconstructed difference = decoded difference + 1/4
* (1 << decoded_bit_plane) and the reconstruction point = reference point +
reconstructed difference * Q_enh + Q_enh/2.

In the present embodiment, referring to Figs. 3C and 3D, the optimal 
reconstruction point is not the lower boundary of a quantization bin. The above 
method specifies how to obtain the optimal reconstruction point in cases where the
difference is quantized and received partially, i.e. not all of the enhancement layers
generated are either transmitted or received as shown in Fig. 1, wherein M < N.
What is claimed is:

1. A video encoding method for adapting a video input to a bandwidth of a
transmission channel of a network, the method comprising the steps of:

determining number N of enhancement layer bitstreams capable of being
adapted to said bandwidth of said transmission channel of said network;

encoding a base layer bitstream from said video input;

encoding N number of enhancement layer bitstreams from said video
input based on the base layer bitstream, wherein the N enhancement layer
bitstreams complement the base layer bitstream; and

providing the base layer bitstream and N enhancement layer bitstreams to said
network.

2. The video encoding method according to claim 1, wherein the
determining step includes negotiating with intermediate devices on said
network.

3. The video encoding method according to claim 2, wherein
negotiating includes determining destination resources.

4. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by a MPEG-1 encoding
method.

5. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by a MPEG-2 encoding
method.

6. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by a MPEG-4 encoding
method.

7. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by a Discrete Cosine
Transform (DCT) method.

8. The video encoding method according to claim 7, wherein after 
encoding the base layer bitstreams by a Discrete Cosine Transform (DCT) 
method a DCT coefficient is quantized. 

9. The video encoding method according to claim 1, wherein the enhancement
layer bitstreams are based on the difference of an original base layer DCT
coefficient and a corresponding base layer quantized DCT coefficient.

10. The video encoding method according to claim 1, wherein the base
layer bitstream and the N enhancement layers provided to the network are
multiplexed.



11. A video decoding method for adapting a video input to a bandwidth of a
transmission channel of a network, the method comprising the steps of:

determining number M of enhancement layer bitstreams of said video input
capable of being received from said transmission channel of said network;

decoding a base layer bitstream from received video input;

decoding M number of enhancement layer bitstreams from the received video
input based on the base layer bitstream, wherein the M received
enhancement layer bitstreams complement the base layer bitstream; and

reconstructing the base layer bitstream and N enhancement layer bitstreams.



12. The video decoding method according to claim 11, wherein the
determining step includes negotiating with intermediate devices on said
network.

13. The video decoding method according to claim 12, wherein
negotiating includes determining destination resources.

14. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by a MPEG-1 decoding
method.

15. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by a MPEG-2 decoding
method.

16. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by a MPEG-4 decoding
method.

17. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by a Discrete Cosine
Transform (DCT) method.

18. The video decoding method according to claim 17, wherein after
decoding the base layer bitstreams by a Discrete Cosine Transform (DCT)
method a DCT coefficient is unquantized.

19. The video decoding method according to claim 11, wherein coding of the
enhancement layer bitstreams are based on the difference of an original base
layer DCT coefficient and a corresponding base layer quantized DCT
coefficient.

20. The video decoding method according to claim 11, wherein the base
layer bitstream and the M enhancement layers to be reconstructed are
de-multiplexed.






21. A video decoding method for adapting a video input to a bandwidth of a
receiving apparatus, the method comprising the steps of:

demultiplexing a base layer bitstream and at least one of a plurality of
enhancement layer bitstreams received from a network;

decoding the base layer bitstream;

decoding at least one of the plurality of enhancement layer bitstreams based
on generated base layer bitstream, wherein the at least one of the plurality of
enhancement layer bitstreams enhances the base layer bitstream; and

reconstructing a video output.

22. A video encoding method for encoding enhancement layers based on a base
layer bitstream encoded from a video input, the video encoding method comprising the
steps of:

taking a difference between an original DCT coefficient and a reference point;
and

dividing the difference between the original DCT coefficient and the reference
point into N bit-planes.

23. The video encoding method according to claim 22, wherein RUN and EOP
symbols represent the N bit-planes of a DCT block.

24. The video encoding method according to claim 23, wherein the RUN and EOP 
symbols are encoded. 

25. The video encoding method according to claim 24, wherein a sign bit is
encoded if the DCT difference is equal to zero or the sign of the DCT difference is the
same as the corresponding base layer bitstream data.
26. A video decoding method for reconstructing DCT coefficients when M enhancement
layers of N enhancement layers have been received, wherein M < N, comprising:

means for taking a reconstruction difference as a decoded difference and a
portion of a decoded bit-plane;

means for taking a reconstruction point as a reference point and a
reconstructed difference; and

determining an optimal reconstruction point.

27. A method of coding motion vectors of a plurality of macroblocks, the method 
comprising the steps of: 

determining an average motion vector from N motion vectors for N 
macroblocks; 

utilizing the determined average motion vector as the motion vector for the N
macroblocks; and

encoding 1/N motion vectors in a base layer bitstream.
* See Figure 3.
Example: 

If Base Layer quantization is 
AQC = A0C/(2*Q) 
lower boundary is AQC * (2*Q) 
optimal point is AQC*(2*Q) + Q 
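
The example above can be checked numerically with a short sketch. It assumes a non-negative original coefficient and integer (truncating) division in the quantization step; the function and variable names are hypothetical.

```python
def base_layer_points(original_coeff, q):
    """Return (AQC, lower boundary, optimal point) for a base layer bin.

    The bin width is 2*Q, so the optimal point sits Q above the lower
    boundary, i.e. at the middle of the quantization bin.
    """
    aqc = original_coeff // (2 * q)   # assumed integer division
    lower_boundary = aqc * (2 * q)
    optimal_point = lower_boundary + q
    return aqc, lower_boundary, optimal_point
```

For an original coefficient of 37 and Q = 4, this gives AQC = 4, a lower boundary of 32 and an optimal point of 36.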